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SYMPLECTIC RUNGE-KUTTA SCHEMES FOR ADJOINT EQUATIONS, 
AUTOMATIC DIFFERENTIATION, OPTIMAL CONTROL AND MORE * 

J.M. SANZ-SERNAt 


Abstract. The study of the sensitivity of the solution of a system of differential equations with respect to 
changes in the initial conditions leads to the introduction of an adjoint system, whose discretisation is related to 
reverse accumulation in automatic differentiation. Similar adjoint systems arise in optimal control and other ai'eas, 
including classical Mechanics. Adjoint systems are introduced in such a way that they exactly preserve a relevant 
quadratic invariant (more precisely an inner product). Symplectic Runge-Kutta and Partitioned Runge-Kutta meth¬ 
ods are defined through the exact conseiwation of a differential geometric stmcture, but may be chai'acterized by the 
fact that they preserve exactly quadratic invariants of the system being integrated. Therefore the symplecticness (or 
lack of symplecticness) of a Runge-Kutta or Partitioned Runge-Kutta integrator should be relevant to understand its 
perfoiTnance when applied to the computation of sensitivities, to optimal control problems and in other applications 
requiring the use of adjoint systems. This paper examines the links between symplectic integration and those ap¬ 
plications. The article presents in a new, unified way a number of results now scattered or implicit in the literature. 
In particular we show how some common procedures, such as the direct method in optimal control theory and the 
computation of sensitivities via reverse accumulation, imply, probably unbeknownst to the user, ‘hidden’ integrations 
with symplectic Partitioned Runge-Kutta schemes. 
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I. Introduction. Symplectic Runge-Kutta (RK) [24], [31], [39] and Partitioned Runge- 
Kutta (PRK) [1], [40] formulae were introduced to integrate Hamiltonian systems in long 
time intervals. They are defined in terms of a purely geometric property, the conservation 
of the symplectic structure, and provided the hrst widely studied instance of what was later 
termed geometric integration [32]. It is well known that symplectic RK methods may be 
characterized as being those that exactly preserve all quadratic hrst integrals (invariants of 
motion) of the system being integrated. This is a useful property: for instance the (symplectic) 
implicit midpoint rule is sometimes chosen to integrate wave equations because it conserves 
quadratic invariants. However quadratic conservation has taken a back seat to the symplectic 
property itself in the geometric integration literature. The aim of this paper is to emphasize 
that the conservation of quadratic invariants plays an important role in the computation of 
numerical sensitivities, in optimal control theory and in classical mechanics. In all these areas 
there is an interplay between variational equations and their adjoints, an interplay based on the 
conservation of a key quadratic invariant (see (3.5)). The conservation of this invariant gives 
relevance to the symplecticness of the integrator. Actually, some widely used procedures, 
such as the direct method in optimal control theory and the computation of sensitivities via 
reverse accumulation, imply ‘hidden’ integrations with symplectic PRK schemes; therefore 
the theory of symplectic PRK integration should be helpful in understanding such procedures. 
From a more abstract point of view one may say that the purpose of this article is to clarify 
the behaviour of RK integrators vis-a-vis the operation of taking adjoints: an RK method is 
symplectic precisely if it commutes with the formation of adjoints. 

The paper presents a coherent treatment of results spread across the literature of various 
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communities together with some new, unifying results. In order to cater for a variety of 
possible readers, this article is written without assuming much background. We hope it will 
help researchers in optimal control to better understand RK schemes and, similarly, encourage 
RK experts to consider sensitivities and optimal control problems. 

Section 2 provides background on numerical integrators. We introduce the necessary no¬ 
tation and recall a number of properties of symplectic RK and related schemes. In particular, 
we quote some results (Theorems 2.1, 2.4) that ensure the exact preservation by the integrator 
of quadratic conservation laws. 

Section 3, the core of the paper, is devoted to the integration of the adjoint variational 
equations used to perform sensitivity analysis. It is well known that an RK method j\4 applied 
to the variational equations of a system <S automatically produces the variational equations for 
the discretisation of S by means of A4 (Theorem 3.2); in other words, the operation of RK 
discretisation commutes with the operation of forming variational equations. The situation 
for the adjoints is more complicated, cf. [37], because commutation will only take place if 
the discretisation is carried out so as to exactly conserve the key quadratic invariant (3.5) 
and, in some way, this demands a symplectic integrator. There are three cases of increasing 
complexity: 

• <S is integrated with a symplectic RK scheme M. Then the application of M to 
the adjoint equations of S produces the adjoint equations for discretisation of S by 
means of M. (Theorem 3.3). 

• iS is integrated with a non-symplectic RK scheme whose weights do not van¬ 
ish. Then, the adjoint equations for the discretisation are obtained by integrating 
the adjoint equations of S with a different set of RK coefficients, so that the overall 
procedure is a symplectic PRK method (Theorem 3.4). The recipe for the adjoint 
coefficients is given in formula (3.23) below. The method used for the adjoint equa¬ 
tions will in general be of lower order than the RK scheme A4 used for the main 
integration and will also have different stability properties. For these reasons non- 
symplectic methods A4 should be used with care. The computation of sensitivities 
of the discrete solution via automatic differentiation with reverse accumulation im¬ 
plicitly provides the symplectic PRK integration of the adjoint equations with coef¬ 
ficients (3.23) (Theorem 3.6). 

• 5 is integrated with a non-symplectic RK scheme Jvi having one or more null 
weights. Then, to obtain the adjoint equations of the discretisation, the continu¬ 
ous adjoint equations have to be integrated with a fancy integrator outside the RK 
class (see the appendix). Again an order reduction is likely to take place and again 
the fancy integration is implicitly performed whenever differentiation with reverse 
accumulation is used. 

Section 4 deals with the Mayer optimal control problem in the case of unconstrained 
controls. There is again a quadratic conservation law that is of crucial importance and this 
fact brings symplectic schemes to the foreground. The results there are quite similar to those 
in the preceding section (the case of vanishing weights is discussed in the appendix): 

• For a symplectic RK method, commutation [29] takes place : the discretisation of 
the continuous first order conditions necessary for optimality provides the first order 
necessary conditions for the discrete solution (Theorem 4.3). 

• When the equations for the states are discretised with a non-symplectic RK scheme 
with non-vanishing weights, to achieve commutation the costate equations have to 
be integrated by means of a clever set of coefficients that does not coincide with the 
set used for the states (Theorem 4.3). With this clever set, the overall integration 
(statesH-costates) is performed with a symplectic PRK method. In general, an order 
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reduction will take place for states, costates and controls. As first noted by Hager 
[17], the required set of coefficients is alternatively defined, not by imposing sym- 
plecticness of the integration, but by using the direct approach, i.e. by minimising 
the cost in the discrete realm with the help of Lagrange multipliers (Theorem 4.4). 

For a symplectic RK or PRK integration of the system for states and costates, the direct 
and indirect approach are mathematically equivalent. When a non-symplectic PRK is used in 
the indirect approach, the discrete solution cannot be reached via the direct approach, which 
always implies a symplectic integration of the states+costates system. 

Extensions to more general control problems are presented in Section 5. Section 6 is 
devoted to classical mechanics. Hamilton’s variational principle may of course be viewed as 
an optimal control problem: it is a matter of minimising a functional subject to differential 
constraints. As is well known, the application of the theory of optimal control to this situation 
replicates the standard procedure to obtain Hamilton’s canonical equations from Hamilton’s 
principle. In the discrete realm, this process provides the variational derivation of symplectic 
PRK integrators, originally due to Suris [40]. 

Section 7 relates the preceding material to the notions of reflection and transposition of 
RK coefficients introduced by Scherer and Tiirke [35] and Section 8 concludes. 

There is an appendix that deals with the problem of how to ‘supplement’ a given non- 
symplectic RK method with some vanishing weights so as to have a symplectic algorithm for 
partitioned systems. 

In order not to clutter the exposition with unwanted details, I shall not be concerned 
with technical issues such as existence of solutions of implicit integrators, smoothness re¬ 
quirements and so on. These may be very important in some circumstances (e.g. lack of 
smoothness poses difficulties if the controls are constrained, see [9]). 

To keep the length of this work within reasonable limits I shall not discuss some other in¬ 
teresting connections. The duality between the Fokker-Planck equations and the Kolmogorov 
Backward equations in the theory of Markov stochastic processes [12] provides another in¬ 
stance of the occurrence of adjoints; the material in this paper may be easily extended to study 
that situation. The paper [13] shows how the symplecticness of the integrator may be used 
to ensure symmetry-preserving simulations of the matrix Riccati equation in the feed-back 
representation of linear/quadratic optimal control problems. 

2. Numerical integrators. In this section we review some results on RK and related 
methods. For more details the reader is referred to [34], [5], [19], [21], [22]. 

2.1. Runge-Kutta schemes. An RK method with s stages is specified by + 2s num¬ 
bers 

(2.1) ciij , z, j — 1,..., s, Ci-^ i — 1,..., s. 

Given a D-dimensional differential system, F : x R —>• R-^, 

(2-2) ^y = F{y,t), 

to be studied in an interval, tg < t < to + T, and an initial condition 

(2.3) yito) = A e R^, 

the method (2. 1 ) finds approximations to the values y{tn), n = 0, 1 , ..., of the solution 
of (2.2)-(2.3), to < ti < ■ ■ ■ < tN = to + T,hy setting yo = A and, recursively, 

S 

(2.4) yn+l ~ Un ^ ^ 72 = 0, 1 . . . , 1. 

2=1 
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Here = t„+i — t„ denotes the step-length and i = 1,..., s, are the ‘slopes’ 

(2.5) Kn,i = F{Yn^i, tn + C^K) 

at the so-called internal stages j. The vectors Y^ i,..., Yn g are in turn defined by the 
relations 


( 2 . 6 ) 2 hji 'y ^ QijKjij j i — 1 , ..., 5 . 

i=i 

In the particular case where the matrix ( 0 ^) is, perhaps after renumbering the stages, strictly 
lower triangular (explicit RK methods), the stages are computed recursively from (2.5)-(2.6). 
In the general case, (2.5)-(2.6) provides, for each n, a system of coupled equations to be 
solved for the stages. 

The internal stages should not be confused with the values output by the integrator 
and may merely be regarded as auxiliary variables. Alternatively, the vector ^ is sometimes 
viewed as an approximation to the off-step value y{tn + Cihn). It is important to emphasise 
that the differences y{tn + Cihn) — Yn^i are typically much larger than the differences y{tn) — 

Vn- 

When the system (2.2) is autonomous, i.e. F = F{y), the Ci play no role. At the other 
end of the spectrum, if F is independent of y, the RK discretisation amounts to the use in the 
interval to < t < to + T of the composite quadrature rule based on the abscissas Ci and the 
weights bi- 

An RK scheme is said to possess order p if, for to < tn < to + T and smooth problems, 
lUn — y{tn)\ = 0{hP), where h = max„ The expansion of the local truncation error 
in powers of the step-length /i„ includes, for each power k = 1 , 2 ,..., one or several 
elementary differentials of F-, an integrator has order > p if and only if, in that expansion, 
the coefficients of the elementary differentials of orders k = 1,... ,p vanish. For instance, 
the relations (order conditions) 


(2.7) 


S S 

y ^ 62 — 1 , y ^ hidij 

i=l i,i=l 


y ^ biCLijCtjk — g, 


y ^ biCtijClik — 


ensure order at least 3 for autonomous problems. They correspond to the elementary differ¬ 
entials F (of order 1), {dyF)F (of order 2) and {dyF){dyF)F, {dyyF)[F, f] (both of order 
3) {dyF is the Jacobian matrix and dyyF the tensor of second derivatives). Since the work of 
Butcher in the early 1960’s, order conditions and elementary differentials are studied with the 
help of graphs. To impose order > p for autonomous problems, there is an independent order 
condition for each rooted tree with p or fewer vertices. Most, but not all, useful RK schemes 
satisfy Ci = for each i\ for them order p for autonomous problems implies order p for 

all problems. 

In general RK methods do not conserve exactly the quadratic first integrals of the system 
being integrated. The simplest illustration is afforded by the familiar Euler’s rule (s = 1, 
hi = 1, 011 = 0, Cl = 0) applied to the harmonic oscillator {D = 2) 




(superscripts denote components). The (quadratic) energy I = (l/2)((y^)^ -|- is con¬ 

served by the differential system because 



id 1 

yjt^ 


2 ^ 2 
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However for Euler’s rule it is trivial to check that, over one step, 

Hyh+i’vl+i) - + (ylf)^ 

with an energy increase. This lack of exact preservation takes place for all explicit RK inte¬ 
grators, even when their order p is high. On the other hand, it is well known and easy to prove 
that for the implicit midpoint rule (s = 1, &i = 1, an = 1/2, ci = 1/2) and the harmonic 
oscillator= I{yi,yl). 

The present paper is based on the following 1987 result of Cooper [8]. It ensures that 
some RK methods automatically inherit each quadratic conservation law possessed by the 
system being integrated. 

Theorem 2.1. Assume that the system (2.2) possesses a quadratic first integral I, i.e. 
I(-,-) is a real-valued bilinear mapping in xR^ such that, for each A and tg, the solution 
y{t) of(2.2)-(2.3) satisfies {d/dt)I(y(t),y{t)) = 0. The relations 

(2.8) hiOij -f — 0, 1, J — 1,..., s, 

guarantee that, for each RK trajectory {?/„} satisfying (2.4)-(2.6), I(yn, yn) is independent 
ofn. 

We shall not reproduce here the proof of this result; it is similar to that of Theorem 
2.4 below. The relations (2.8) are essentially necessary for an RK scheme to conserve each 
quadratic first integral of each differential system [19, Chapter VI, Theorems 7.6, 7.10]. 

In many applications the system (2.2) is Hamiltonian. This means that D is even and, 
after writing y = [q^, F = [f~^,g~^Y, with q,p,f,g & R"^, d = D/2, there exists a 
real-valued function F[(p, q, t) (the Hamiltonian) such that /’’ = dH/dp^, p’’ = —dH/dq'’, 
r = . ,d (superscripts indicate components). Hamiltonian systems are characterised ge¬ 

ometrically by the symplectic property of the corresponding solution flow [2]. When d = 1, 
symplecticness means conservation of oriented area; in higher dimensions a similar but more 
complicated interpretation, based on differential forms, exists; such interpretation is not re¬ 
quired to read this paper. It is often advisable [34], [19], [25] to integrate Hamiltonian prob¬ 
lems by means of so-called symplectic algorithms, i.e. algorithms such that the transformation 
yn I— yn+i in R^'^ is symplectic; those algorithms are particularly advisable in integrations 
where the interval fo < f < fo + T is long (for a recent reference in that connection, see [11], 
which is part of a project to integrate the solar system over a 60 million year interval). Us¬ 
ing the method of modified equations [16], each numerical solution may (approximately) be 
interpreted as a true solution of a nearby differential system called the modified system. For 
symplectic methods applied to Hamiltonian systems, the modified system is Hamiltonian; for 
non-symplectic discretisations, the modified system, while perhaps close to the system being 
integrated, is not Hamiltonian and this fact is likely to imply a substantial distortion of the 
long-time dynamics [34], [19]. 

The first symplectic integrators were constructed in an ad hoc way; it was later discovered 
(independently by Lasagni [24], Suris [39] and the present author [31]) that the class of RK 
methods contains many symplectic schemes: 

Theorem 2.2. Assume that the system (2.2) is Hamiltonian. The relations (2.8) guar¬ 
antee that the mapping ?/„ i—> y„_|_i defined in (2.4)-(2.6) is symplectic. 

The proof of Theorem 2.2, not included here, is very similar to the proof of Theorem 
2.1. Just as for the conservation of quadratic first integrals, it turns out, see [34], Section 6.5, 
that the relations (2.8) are essentially necessary for i— yn+i to be symplectic for each 
Hamiltonian system. 

The set of relations (2.8) thus ensures two different properties: quadratic conservation 
and symplecticness. These two properties are not unrelated: symplecticness may be viewed 
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a consequence of the quadratic conservation because, as noted in [3], the preservation of the 
symplectic structure by a Hamiltonian solution flow may be interpreted as a bilinear first 
integral of the solution flow of the associated variational system. 

The symplectic character of RK schemes satisfying (2.8) has attracted much attention in 
view of the importance of Hamiltonian systems in the applications. On the other hand, it is 
fair to say that quadratic conservation has been to some extent played down in the geometric 
integration literature. For this reason, while schemes satisfying (2.8) could have been called 
conservative, the following terminology is standard; 

Definition 2.3. The RK scheme (2.1) is called symplectic (or canonical) if (2.8) holds. 

Our focus in this paper is on symplectic schemes in as far as they conserve quadratic 
invariants, as these are actually crucial in several applications. The discussion of any possible 
benefits derived from the symplectic character of the map Un+i, including the existence 
of modified Hamiltonian systems, are out of our scope here. The paper [7] is, in this sense, 
complementary to the present work. 

It was proved in [33] that the relations (2.8) act as simplifying assumptions vis-a-vis the 
order conditions: once these relations are imposed, the order conditions corresponding to the 
different elementary differentials/rooted trees are no longer independent. For instance, it is 
a simple exercise to show that, when (2.8) holds, the second order condition in (2.7) is a 
consequence of the first and therefore symplectic RK schemes of order > 1 automatically 
possess order > 2. Similarly the last order condition in (2.7) is a consequence of the first 
three. In this way, for a general RK methods to have order > 3 for autonomous problems, 
there are 4 order conditions; for symplectic methods the number is only 2. For a symplectic 
RK method to have order > p for autonomous problems there is an order condition for each 
so-called non-superfluous free tree with < p vertices. 

There are many symplectic RK methods [34] including the Gauss methods (of maximal 
order 2s and positive weights) as first shown in [31]; however no symplectic RK scheme is 
explicit. The simplest Gauss method (s = 1) is the familiar implicit midpoint rule. 

2.2. Partitioned Runge-Kutta schemes. In some applications the components of the 
vector y in (2.2) appear partitioned into two blocks: y = q € p € 

Hamiltonian problems, where d = D/2, provide an example, as we have just seen. In those 
cases it may make sense to use a set of coefficients (2.1) for the integration of the block q and 
a second set 




(2.9) 


for the integration of the block p. (There is no loss of generality in assuming that the number 
of stages s in (2.9) coincides with that in (2.1); see [34] Remark 3.2.) The overall method is 
called a PRK scheme. A more precise description follows. 

Denote by F = [f^, f € g € the partitioning of F induced by the 

partitioning \q^,p^Y of y, so that (2.2) reads 



then the equations for the step n —n + 1 of the PRK method (2.1), (2.9) are 


S 


S 


( 2 . 11 ) 



where 



( 2 . 12 ) 
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and the internal stages Qn,i, Pn,i, * = 1,..., s, are defined by the relations 


S S 



PRK methods are not a mathematical nicety: the Verlet algorithm, the method of choice 
in molecular dynamics [36] is one of them. In its so-called velocity form, the algorithm is 
written in the molecular dynamics literature as (it is a simple matter to rewrite the algorithm 
in the format (2.11)-(2.13)): 


Pn+l/2 —Pn P 2 

Qn+1 — Qn “t” hjiJVI Pn+l/2; 

Pn+1 = Pn+l/2 + -^giQn+l,tn+l)- 


Here the vectors p, q and g contain respectively the momenta, positions and forces and M 
is the diagonal matrix of the masses. Note the way the q and p variables are advanced in 
different ways. 

Clearly an RK scheme may be regarded as a particular instance of a PRK method where 
the two sets (2.1), (2.9) happen to coincide. For PRK methods to possess order > p for 
autonomous problems, there is an order condition associated with each bicolour rooted tree 
with p or less vertices (see e.g. [19, Chapter 111]). For order > 2 the order conditions are: 



(2.14) 



(2.15) 


they correspond to the elementary differentials /, g, (dxf)f, {dxf)g, {dxg)f, {9xg)g respec¬ 
tively. It will be important later to note that, if the PRK (2.1), (2.9) has order p, then the RK 
scheme with coefficients (2.1) and the RK scheme with coefficients (2.9) have both order p. 
The converse is not true: if (2.1) and (2.9) are the coefficients of two RK schemes of order 
p, then the combined PRK scheme may have order < p. This is plain in (2.15), where the 
second and third relations are necessary for the PRK to have order > 2 but are obviously not 
required for (2.1) and (2.9) to be the coefficients of two different RK schemes of order > 2. 

For PRK methods, the result corresponding to Theorem 2.1 is (cf. [19, Chapter IV, 
Theorem 2.4], where only the autonomous case is envisaged): 

Theorem 2.4. Assume that S{-, •) is a real-valued bilinear map in x such 

that, for each to and A, the solution y{f) = [q{f)^ ,p{f)^Y of (2.3), (2.10), satisfies 


^S{q{t),p{t)) = 0. 


The relations 


(2.16) bi Flj, 2 1,..., s, h^Aij -t- Bjciji hj^Bj 0, 1,..., s, 


and 


(2.17) 


Cj — Ci ; ^ 1; * ■ 
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guarantee that, for each PRK trajectory satisfying (2.11)-(2.13), S{qmPn) is independent of 
n. 

As in the case of RK methods, the condition in the theorem is necessary for conservation 
to hold for all S and all partitioned differential systems, see [19, Chapter VI, Theorems 7.6, 
7.10]. In the particular case of autonomous problems the abscissas play no role. Thus, to 
achieve conservation, it is not necessary to impose the condition (2.17) whenever / and g 
are independent of t. Note that the theorem only applies to a quadratic function of the form 
S{q,p) which is not the most general possible; for instance the inner product q~^q is not 
included in that format. 

Before proving the theorem we present a simple algebraic auxiliary result that will be 
used repeatedly later in other contexts. 

Lemma 2.5. Let qn, Pn, Qt, Pi, kn,i, ^n,i be arbitrary vectors satisfying (2.11) and 
(2.13). If S is bilinear and (2.16) holds, then 

(2.18) , p^-|_i) S(qji,pn) — hn'^hi[s(kn,i,Pn,i) + S(Qn,iAn,i)^ ■ 


Proof. Since S is bilinear, we may write from (2.11) 

^ (^n+l; Pn+l) S , ‘Pn) — ^ ^ i Pn') hn 

^ 3 

Now use (2.13) to eliminate and pn from the right-hand side: 

^ (.Qn-\-l 5 Pn-\-l') S {^Qn i Pn) — ^ ^ ; Pn,i ^ ^ ) 

^ 3 

hfi ^ ^ BjS(^Qn^j ^ ^ i ^n,j ) 

3 ^ 

+ hl'^biBjS{kn,i,in,j)- 

ij 

In view of the bilinearity and (2.16), the proof is complete. □ 

Proof of the theorem: Conservation of S implies that 

S{f{q,P,t),p) + S{q,g{q,p,t)) = 0, 
because, along each solution q{t), p{t), 

S{^q{t),p{t)) +S{q{t), ^Pit)) = j^S(q(f),p(f)) = 0. 

Therefore (2.12) and (2.17) entail that the right-hand side of (2.18) vanishes. □ 

For the preservation of the symplectic structure, the result (derived in [40] and [1] inde¬ 
pendently) is: 

Theorem 2.6. Assume that the system (2.10) is Hamiltonian. The relations (2.16)- 
(2.17) guarantee that the mapping (q-mPn) (7n-i-i,Pn-i-i) defined in (2.11)-(2.13) is sym¬ 
plectic. 
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The conditions (2.16)-(2.17) are essentially necessary for symplecticness [34] and hence 
the following definition: 

Definition 2.7. The PRK scheme (2.1), (2.9) is called symplectic if (2.16)-(2.17) 
hold. 

If the PRK is symplectic, there is a reduction in the number of independent order con¬ 
ditions; the classes of equivalent order conditions were first described by Hairer [18]. An 
alternative treatment (see [27]) based on so-called H-trees was given by Murua in his 1995 
thesis, cf. [4]. For instance, for a symplectic PRK method to have order > 4 it is necessary to 
impose 13 order conditions: for general PRK methods that number is 36. 

3. Variational systems and their adjoints. We now explore the role of symplectic RK 
schemes when integrating adjoint variational systems. A comprehensive discussion of the use 
of adjoints to determine sensitivities is not within our scope here. The paper [14] provides 
a general introduction, together with applications to aerodynamics. Applications of adjoints 
to atmospheric models are discussed in [30]. Of course the idea of an adjoint problem is 
not restricted to differential equations; see [6] for an early paper describing a very general 
framework. 


3.1. The continuous prohlem: quadratic conservation. We now present the math¬ 
ematical foundations of the remainder of the paper. Consider a d-dimensional differential 
system 


(3.1) 





and denote by a S the corresponding initial value and by x{t) the solution that arises 
from the perturbed initial condition x{to) = a + r]. Linearisation of (3.1) around x(t) shows 
that, as \r]\ —>■ 0, x{t) = x{t) + 6{t) + odiyl), where 5 solves the (linear) variational system 
(see e.g. [21] Section 1.14) 


(3.2) = da:f{x(t),t)S, 

(dxf is the Jacobian matrix of / with respect to x). Thus, when x{t) is known, solving for 
S{tQ -b T) the initial-value problem given by (3.2) and (5(fo) = V yields an estimate for the 
change in solution x{t) — x(f); see a simple example in Fig. 1. 

The adjoint system of (3.2) is given by 


(3.3) ^X =-dxf{x{t),t)'^ X. 

(To avoid confusion, variables in this paper are always column vectors', from a mathematical 
point of view it would have been better to write sensitivities, Lagrange multipliers and mo¬ 
menta as row vectors, as they belong to the dual space of the space of states.) The right-hand 
side in (3.3) has been chosen in such a way that the following proposition is valid. More 
precisely, it is best to think that the adjoint is the system for which the conserx’ation property 
(3.5) below holds. 

Proposition 3.1. For each x, S, X G and real t: 


( - dxf(x,t)'^ xY6 + X^dxf(x,t)5 = 0 . 


Therefore if 5(f) and X(f) are arbitrary solutions of (3.2), (3.3) respectively, then 


|A(,m) = (4A(,))b(,) + A(,T(4i(t))^0 


(3.4) 
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and accordingly 

(3.5) X{to + TV5{to + T) = 


Why is the adjoint system useful? Regard 77 as a parameter and assume that we are 
interested in finding ui ^+ T) for fixed w G i.e. in estimating, at the final time fg + T, 
the change along the direction of w of the solution of (3.1) induced by the initial perturbation 
a a + rj. (For instance choosing w equal to the ?’-th co-ordinate vector would correspond to 
estimating the change in the r-th component of the solution.) When x{t) is known, we solve 
(3.3) with the final condition A(fo + T) = to and note that the quantity we seek coincides 
with A(fo )^?7 because, from the proposition, 

u!~^S{to +T) = A(fo + T)^ S(to + T) = A(fo)^<^(fo) = X{tQ)~^r]. 

The advantage of this procedure is that, as 77 varies, the computation of A(fo )^»7 requires 
only one integration of (3.3); the computation of w^(5(to + T) via (3.2) would need a fresh 
integration for each new choice of 77 (see Fig. 1). 

As an application, consider the task of computing the gradient, VaC(x(fo + T)), of a 
real-valued function C with respect to the initial data a. We set to = VxC{x{to +T)) in the 
preceding construction and successively let the r-th coordinate vector, r = 1,... ,d, play the 
role of 77 to conclude that the gradient sought has the value A(to) where X{t) is the solution of 
the adjoint system with final condition A(fo +T) = VxC{x{tQ + T)). Only one integration 
is required to find d derivatives djdd!^. The adjoint system (3.3) ‘pulls back’ gradients with 
respect to x{to -f T) into gradients with respect to x(fo). 

3.2. The continuous problem: Lagrange multipliers. We shall also need an alterna¬ 
tive derivation of the recipe \/aC{x{tQ + T)) = A(fo) just found. Since the use of Lagrange 
multipliers (see e.g. [14, Section 2.5]) in this connection (as distinct from their use in min¬ 
imisation) may not be known to some readers, we give full details. Define the Lagrangian 
functional C = C{a, x, Ag, A) 


C = 


C{x{to + T)) - AJ {x{to) - a) - j 

Jt, 




HtV- f{x{t),t 


dt. 


where, a, Ag are arbitrary vectors, x, X arbitrary functions. A key point here is that, whenever 
i is a solution of (3.1) and x{to) = a, the value of C{a, x, Ag, A) coincides with C(i(fg-|-T)). 
If 77 and S are the variations of d and x respectively, the variation SC of the functional is 

SC = VxC{x{to + T))'^S{to + T) - Aj {S{to) - 77 ) 

pto+T ^ ^ , 

- Jt HtV - dxf{x{t),t)S{t)j dt, 

so that, after integration by parts, 

SC = {VxC{x{to + T)) - X{to + T)ySito +T) + X{toVr] 

+ (A(fg) — Ag) S{to) 

+ J (^J^X{t)'^S{t) + X{t)'^dxf{x{t),t)S{t)^dt. 
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4 6 8 10 12 14 16 18 


PREY 


Fig. 1. Two-species Lotka-Volterra system dx^/dt = — 0.2x^x^, dx^/dt = —2x^ + Q.2x^x^ 

(superscripts indicate components of vectors); x^ and x^ represent, in suitable units, numbers of preys 
and predators respectively. The solid lines give, for 0 < t < 1, the unperturbed solution x(t) with initial 
condition a;(0) = (15,10) and a perturbed solution x{f) with a;(0) = a;(0) +17 = (16, lO).' an increase 
in the number of preys at t = 0 leads at t = 1 to a decrease in the number of preys and to an increase 
in the number of predators. The stars are the points x{t) + S{t), 1 = 0,0.05, 0.10,..., where 5 solves 
the variational system; they almost coincide with the corresponding values of the perturbed solution 
x(t). In particular, the change in the number of preys, 2 :^( 1 ) — a;^(l), is very well approximated by 
(5^(1) = —0.1786 ..., i.e. by the inner product a;^5(l), where to denotes the first co-ordinate vector 
(1,0) = The variational equations move 77 = S{0) forward to <5(1). The dots show how the 

adjoint equations move to — A(l) backward to yield A(0) = V 3 ;(o)X^(l), the gradient of x^ as a 
function ofx{0). The inner product ti;^(5(l) exactly coincides with \{0)^r]. In a Lotka-Volterra system 
with d species, a single integration of the adjoint system is necessary to find the d-dimensional gradient 
of x^{l) as a function ofx{0). 


We now make choices Aq, A (depending on a and x) for the (so far arbitrary) multipliers Aq, 
A. We define A as the solution of the equation (3.3) (with x{t) in lieu of x{t)) subject to the 
final condition A(fo + T) = VxC{x{to + T)) and set Aq = A(fo)- These choices ensure 
that, at a, x, the intermediate variation 5{t) does not contribute to 5C', we then have (at d, 
x) 6C = A(lo)^?? or, in other words, A(lo) is the gradient of £ as a function of a. Since, as 
pointed out above, if x solves (3.1) and x{tf) = a, then £(d, x, Aq, A) = C{x{tQ + T)), we 
conclude that A(fo) = ^aC{x{to + T)) as we wished to prove. The original system (3.1) and 
the initial condition may also be retrieved from the Lagrangian by making zero the variations 
with respect to A and Aq respectively. 

The same approach may also be used if we wish to make things more involved and 
introduce the velocity {d/dt)x = fc as a new argument in the Lagrangian. To simplify the 
notation we shall hereafter drop all hats, so that the same symbols a, x, ... will be used for 
the arbitrary arguments of the Lagrangian (that previously were written as a, x, ...) and for 
the corresponding values at the solution sought. When the velocity is considered as a new 
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argument, the Lagrangian becomes 


(3.6) 


£ = C{x{to + T)) - AJ {x{to) - a) 

rto+T 1 


I 


io 

(‘tQ-\-T 


'to 


A{ty(k{t) - f{x{t),t)'j dt. 


Taking variations and choosing the multipliers to cancel the undesired contributions to 6C, 
leads to the relations A(io) = S/aC{x{to + T)), A(io + T) = VxC{x{to + T)), Aq = A(io) 
found above and, additionally, to A{t) = X{t) (as expected). 

3.3. The discrete problem: RK integration. Let us suppose that (3.1) has been discre- 
tised by means of the RK scheme (2.1) to get, n = 0,..., — 1, 


(3.7) Xn-i-i — Xji T hn ^ ^ 

z=l 

(3.8) — f 5 tn T ^i^n ); ^ 1 7 . * . ; 

s 

(3.9) A^n,i — “b ^ ^ ^ ^ — 1; • . ■ 5 

f=l 

and that, in analogy with the preceding material, we wish to estimate the impact on x at of 
a perturbation of the initial condition xg = a. Linearisation of the RK equations (3.7)- 
(3.9) around x„, Xn,i shows that the perturbed RK solution x„, n = 0,... ,N, satisfies 
X„ = Xn + Sn + 0 (|p|) with 


(3.10) Ayi+i — Sn T hji ^ ^ bidn^ij 

i=l 

(3.11) dji^i — dxf “b Cj/tyj) 2 ; ^ — 1; . . . ; 

S 

(3.12) 2 — Sji “b hji ^ ^ Qijdji^j , i — 1,..., s 

(the vectors dn,i and are the variations in the slopes ^ and stages ^ respectively). 

On the other hand, if we regard the given differential equations (3.1) together with the 
variational equations (3.2) as a 2(i-dimensional system for the vector y = [x^ , and apply 
the RK scheme as in (2.4)-(2.6), we also arrive at (3.7)-(3.12). We have thus proved, as in, 
say, [19, Chapter VI, Lemma 4.1]: 

Theorem 3.2. The process of RK discretisation commutes with forming variational 
equations: the RK discretisation of the continuous variational equations (3.1)-(3.2) yields 
the variational equations (3.7)-(3.12) for the RK discretisation. 

The situation for the adjoint equations is not quite as neat (cf. [37]). In order to find the 
discrete sensitivity uj^6m we would like to numerically integrate (3.3) with final condition 
Xn = O' in such a way that (cf. (3.5)) 

(3.13) A^<5^ = Ajbo. 

Although in actual computation the approximations A„ are to be found without using the 
equations (3.10)-(3.12) for (this is the whole point behind the use of adjoints), let us 
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h AJ?7 Ajry — A(0)^?7 — w^i5(1) 

0.100 -0.1070 -0.2497 01)717 -0.0710 

0.050 -0.1401 -0.2135 0.0385 -0.0348 

0.025 -0.1588 -0.1959 0.0199 -0.0172 

Table 1 

Euler integration on a uniform grid of the x, 5, A equations for the Lotka-Volterra problem in Fig. 1. 
The lack of symplecticness of the integrator results in XItj being different from uy the discretisation 

of the adjoint equations does not provide the adjoint of the discretisation. The convergence of the 
integrator implies that, as the grid is refined, Ajiy and ui 5 n are 0(h) away from their common limit 
A(0)^77 = a;^(5(l) ~ —0.1786, as borne out by the last two columns. When, alternatively, the A 
equations are integrated with the Radau method (3.24) the numerical results for XJiTj coincide with 
those displayed in the third column of the table. 


consider for a moment the 3d-dimensional system (3.1)-(3.3) for the extended vector y = 
(5^, A^]^. Then the condition (3.13) demands that we integrate this large system in such 
a way as to exactly preserve the invariant I{y{t),y{t)) = A(f)^5(f) in (3.4). According to 
Theorem 2.1, we may achieve this goal by using the RK scheme (2.1) provided that it is 
symplectic. This results in the relations (3.7)-(3.12) in tandem with (n = 0,..., iV — 1): 


(3.14) 

(3.15) 

(3.16) 


An+l — “f ^ ^ 

i=l 

^n,i — ^xfi,^n,i:tn 1“ ttihyi') Z — 1, 

s 

^n,i — Ati -f ^ ^ i — 1, . . . , 5. 

i=i 




Let us summarise the preceding discussion: 

Theorem 3.3. Assume that the id-dimensional system (3.1)-(3.3) is discretised by a 
symplectic RK scheme (2.1). Then for any RK solution (3.13) holds. In particular, for the RK 
solution specified by the initial condition Xq = a, Sq = rj together with the final condition 
Ajv = w, 

= Ajzy. 


For a non-symplectic RK scheme of order p, ui^Sn and Ajp are approximations of order 
p to their continuous counterparts u;^(5(fo + T) and A(fo)^?? respectively and therefore Ajzy 
will be a 0{hP) approximation to the true sensitivity uj~^6n of the discrete solution. See the 
example in Table 1 where the Euler integrator was chosen so as to have large errors and see 
clearly the difference between uj~^6n and Ajp. 

In practice, the variational equations (3.2) do not need to be integrated. We successively 
find xq, xi, ..., xn via (3.7)-(3.9) and, once these are available, we set Aat = w, and compute 
Aat-i, ..., Aq from (3.14)-(3.16) taken in the order n = N—l,N — 2,...,0. For this reason, 
it may be advisable to rewrite (3.14)-(3.16) in the following ‘reflected’ form (see Section 7) 
that emphasises that the approximation A„ at is to be found from the approximation A„+i 
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ln+1 ■ 


(3.17) 

s 

“1“ ( ^n) ^ ^ 

i—1 

(3.18) 

^n,i — ^xf “1“ (1 ^n)) ^ — 1 

(3.19) 

^n,i — -^n+l “1“ ( ^ ^ ^ J ^ — 1, . . . , 5. 


i=i 


In analogy to the continuous case, for a symplectic RK discretisation, Va.C(xN) may 
be computed by finding Aq from the recursion (3.14)-(3.16) (or (3.17)-(3.19)) with Aat = 

S/xC{xn)- 

3.4. The discrete problem: PRK integration. Theorem 3.3 may be generalised easily 
with the help of Theorem 2.4. Hereafter it is understood that when using the PRK scheme 
the X, S equations are integrated with the set of coefficients (2.1) (so that the (5„ are exactly 
the variations in Xn) and the A equations with the set of coefficients (2.9). In other words, 
the system is partitioned as q = p = A.' This approach leads to (3.7)-(3.12) 

supplemented by the relations obtained by replacing the lower case coefficients , bi, Ci in 


(3.14)-(3.16) by their upper case counterparts: 


(3.20) 

S 

-^n+l — ^ ^ 

i—1 


(3.21) 

^n,i — ^xf ^ — 1 ; ■ ■ 

. ,s 

(3.22) 

-^n,i — hn ^ ^ -^ij ^n,j i ^ — 1 , . . . , 5. 



j=i 


The generalisation of Theorem 3.3 is: 

Theorem 3.4. Assume that the 3d-dimensional system (3.1)-(3.3) is discretised by a 
symplectic PRK scheme (2.1), (2.9). Then (3.13) holds for any PRK solution. In particular, 
for the PRK solution specified by the initial condition xq = cx, Sq = rj together with the final 
condition Xff = uj, 

= AJ?7. 


Once more, for a symplectic PRK discretisation, the gradient VaC{xN) coincides with 
Ao if Xn = ^xC{xn)- For a non-symplectic discretisation of the adjoint equations, Aq is 
a only an approximation to VaC(xN)- For this reason non-symplectic PRK discretisations 
cannot be implied by the direct differentiation procedure described in Section 3.5. 

How do we compute exactly (i.e. up to round-off) the sensitivity 5n with the help 
of the adjoint system when the x integration has been performed with a non-symplectic RK 
scheme (2.1) and Theorem 3.3 cannot be invoked? Theorem 3.4 suggests the way. For sim¬ 
plicity we only look at the case where in (2.1) none of the weights bi, i = 1 ,... , s, vanishes 
(for the general situation see the appendix). From the coefficients in (2.1) we compute a new 
set 

(3.23) Aji = bi — bittij/bj, i, j = 1,..., s, Bi = bi, Ci = Ci i = 1,..., s. 

'a variation on this theme is presented in [28, Section 6] in the context of optimal control problem. There the x 
equations are themselves partitioned and integrated by means of a symplectic PRK. 
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In view of (2.16)-(2.17), we now have a PRK scheme for the discretisation of (3.1)-(3.3) and 
Theorem 3.4 applies. If (2.1) is explicit, the computations required to descend from A^r to Aq 
are also explicit. Here is the simplest example. Assume that the x equations are integrated 
with the explicit Euler rule (s = 1, an = 0, 6 i = 1, ci = 0). With that choice, i = Xn 
and 

^n+l — T 

The trick just described yields An = 1, Bi = 1, Ci = 0. Accordingly, the stage A„_i 
coincides with A„+i and using (2.11) we see that the required A integrator is: 

(3.24) Atj+I — Xn hnOxf A 7 J+I. 

Obviously this is not the explicit Euler rule, because A in the right-hand side appears at time 
tn+i- And, unless the problem is autonomous, it is not the implicit Euler rule either because 
t is evaluated at the retarded time (Eor RK enthusiasts only: the coefficients An = 1, 
i3i = 1, Cl = 0 correspond to the Radau lA method of one stage introduced by Ehle, [22, 
Section IV.5].) 

In the particular situation where the x integration has been performed by a symplectic 
RK method (symplectic RK methods possess non-vanishing weights [34], Section 8.2), the 
recipe (3.23) will lead to Aij = a^ and the resulting PRK method will coincide with the 
original RK method. In the general case, for (3.13) to hold, the adjoint equations for A have 
to be integrated with coefficients different from those used for the original equations for x. 

There are hidden difficulties with the use of this recipe. When stability is an issue, as 
in stiff problems or time-discretisations of partial differential equations, it is necessary to 
investigate carefully the stability behaviour of the A integration [37]. On the other hand, and 
as noted before, the order of accuracy of the overall PRK, x, A, integrator may be lower 
than the order of the RK method (2.1) for x we started with. When investigating the order 
of the overall PRK method we have to take into account that the right-hand side of (3.1) is 
independent of A and the right-hand side of (3.3) is linear in A. These features imply that many 
elementary differentials vanish and that accordingly it is not necessary to impose the order 
conditions associated with them. Eurthermore we have to take into account the reduction in 
the number of independent order conditions implied by symplecticness. 

3.5. The discrete problem: automatic differentiation. According to the preceding 
discussion, for any RK integration of (3.1) with nonzero weights, it is possible to find the 
gradient VaC{xN) by means of an integration of the adjoint equations with the coefficients 
(3.23). It is however clear that it is also perfectly possible to compute VaC{xN) by repeatedly 
using the chain rule in (3.7)-(3.9), something that we shall perform presently. Since C is scalar 
and a G R'^, where d is possibly large, reverse accumulation [15]^ is to be preferred and this 
may be performed with the help of Lagrange multipliers as in Section 3.2. 

We shall need the following auxiliary result: 


^Recall that the idea of reverse accumulation is as follows. Imagine an application of the chain rule that leads 
to a product J3J2J1, where J3 is the Jacobian matrix d{z)/d{y) of the final variables z with respect to some 
intermediate variables y and similarly J 2 = d{y)/d{x), Ji = d{x)/d{w) (w are the independent variables). 
When the dimension of z is much lower than the dimensions of x, y and w, computing the ‘short’ (few rows) 
matrices K = J3J2 and KJi (reverse accumulation) is much cheaper than first forming the ‘tall’ (many rows) 
matrix L = J2J1 and then J3L (forward accumulation). The forward order J3(J2 Ji) finds successively fhe 
Jacobians Ji = d{x)/d{w), J2J1 = d(y)/d{w) and J3J2J1 = d{z)/d{w). In reverse mode, the intermediate 
Jacobians are J3 = d{z)/d{y), J3J2 = d{z)/d{x), J3J2J1 = d{z)/d{w). The analogy with the 5 and A 
equations is manifest. 
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Lemma 3.5. Suppose that the mapping O : —>■ is such that the Jacobian 

matrix is invertible at a point (aoi 7o) S x , so that in the neighborhood ofag, 
the equation fl(a,j) = 0 defines 7 as a function of a. Consider a real-valued function in 
of the form ijj{a) = ^'(a, 7 ( 0 ;)), for some 'I' : —>• R. There exists a unique vector 

Aq £ R'^ such that (superscripts denote components): 


d' 

^af^lao ^a^l(QOi7o) l(ao,7o)i 

r—1 

d' 

0 = ^7^|(ao,7o) A' 'y ^ AQV.yn l(c(o,7o)- 
r—1 

Proof. The second requirement may be rewritten as 
(3.25) (5^fl)'^Ao = 

with the matrix and right-hand side evaluated at ao, 70 . This is a linear system that uniquely 
defines Aq. To check that the vector Aq we have just found satisfies the first requirement, we 
use the chain rule 


da'^\a ^ct^l(a,7(a)) “f ^7^^ I (a,7(a))^a7l a 5 

differentiate Cl(a, 'y{a.)) = 0 to get 


^aAl|(a^ 7 (a)) T 0: 

evaluate at ao, and eliminate da'y\ao- D 

It is useful to rephrase the lemma by introducing the Lagrangian 

£(a, 7 , A) = T'(a, 7 ) + A^n(a, 7 ). 

so that the relation n(ao,7o) = 0 and the equation (3.25) that defines the multiplier are 
respectively 


V^£(^1, 7 ; I (qq ,70, Ao) ^5 7 ;'^) I (cKO ,70, Aq) 

while the gradient we seek is computed as 

a'4’\aQ 7; I (ao,7o,Ao) ■ 

Note that these developments mimic the material in Section 3.2, with 7 playing the part of x, 
7 o the part of x, etc. 

In numerical differentiation, ip is the function whose gradient is to be evaluated, the 
components of a are the independent variables, and the components of 7 represent interme¬ 
diate stages towards the computation of ip. (For instance, in the simple case (d = 1) where 
ipia) = ay^l -I- aexp(a) cos(exp(a)), we may set the constraints 12^ = 7 ^ — exp(a) = 0, 
= 7 ^ — cos( 7 ^) = 0, = 7 ^ — 07 ^ 7 ^ = 0, = 7 ^ — \/l 7 ^, Ip = a 7 "‘.) The 

interpretation of the 7 ’' as successive stages implies that, in practice, fl will possess a lower 
triangular structure: 12’’ will only involve 7 ^,... , 7 ’’. The evaluation of ip successively finds 
the numerical values of 7 ^,... , 7 ^^ in a forward fashion. The numerical values of the com¬ 
ponents Aq, are then found by backward substitution in the upper-triangular linear system 
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(3.25) and finally the lemma yields the required value of the gradient. If and have been 
judiciously chosen, then the mappings Vq.'I', Vaff'', required to compute the 

gradient will have simple analytic expressions, easily derived by a human or by a computer 
programme. 

We now apply this technique to find VaCjtCAr). In (3.7)-(3.9) we let (the components of) 
Xn, n = 0,N, and i, n = 0,..., iV — 1, z = 1,..., s, play the role of (the components 
of) 7 and introduce the Lagrangian 


N-l 


C{xn^ Aq (xq Ol) ^ ^ ^nAyj_j_2^ (Xyi+l ^n) ^ ^ 

n ^ ■ - 

n—0 

N-l s 

(3.26) ^ ^ ^ ^ f 


n—0 2=1 


where we understand that the stage vectors Xn,i have been expressed in terms of the Xn and 
kn^i by means of (3.9). Clearly this discrete Lagrangian is the natural RK approximation to 
(3.6). 

A straightforward application of Lemma 3.5 now directly yields the following result, 
where we note that the hypothesis 6^ ^ 0, z = 1,..., s, is natural because, when, say, bi = 0, 
the Lagrangian (3.26) does not incorporate the constraint i = /(Ar„ i, + ci/z„). (The 
case of zero weights is considered in the appendix.) 

Theorem 3.6. Consider the RK equations (3.7)—(3.9), with bi ^ 0, i = 1,..., s. The 
computation of'VaC{x]\[) based on the use of Lemma 3.5 with Lagrangian (3.26) leads to 
the relations (3.20)-(3.22), with the coefficients Aij, Bi, Ci given by (3.23), together with 

Va;C(x7v) — AtVj ^aC{X]^) — Aq. 

Note that, in the situation of the theorem, Aat, \n-i, ^n- 2 , ...successively yield the 
gradients \/xnC{xn), Vxn-iC{xn), ^XN- 2 ^ixN), ■■■ It is well known that the reverse 
mode of differentiation implies an integration of the adjoint equations. The theorem shows 
additionally that, for an RK computation of x, the implied adjoint equation integration is such 
that the x, A system is discretised with a symplectic PRK method. Recall that we showed in 
the preceding subsection that nonsymplectic PRK cannot appear in this setting as they do not 
find exactly VaCjxAr). In a way the chain rule provided us with symplectic integration before 
the latter was invented. 

A further remark: the use of the chain rule with forward accumulation implies an RK 
integration of the variational equations (3.2) with the original RK coefficients (2.1). In agree¬ 
ment with a previous discussion, the forward mode is more expensive; each partial derivative 
djddJ', r = 1,..., d, in the gradient requires a separate integration. 

4. A simple optimal control problem. We explore next the role of symplectic methods 
when integrating the differential equations that arise in some optimal control problems [38], 
[41], [42]. In this section we look at the simplest case, where the developments are very 
similar to those just considered; more general problems are treated in the next. 

4.1. The continuous problem. Consider now the d-dimensional system 
(4.1) ^x = f(^x,u,t), 

where x is the state vector and u a j/-dimensional vector of controls. Our aim is to find 
functions x{t) and u{t), subject to (4.1) and the initial condition x{tQ) = a G so as to 
minimise a given cost function C{x{to + T)). 
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The variational equation is (cf. (3.2)) 

(4.2) = d^f{x{t), u{t),t) 6 + dufix{t),u{t), t) C, 
at 

where du is the Jacobian matrix of / with respect to u and C denotes the variation in u, see 
e.g. [38, Section 2.8], [41, Section 5.1]. Now 6{to) = 0, as x^to) remains nailed down at a. 
An adjoint system (cf. (3.3)) 

(4.3) = -da;fix{t),u(t), t)'^ A, 
and constraints 

(4.4) dufix{t),u{t), tyX{t) = 0, 

are introduced, see e.g. [38, Section 9.2]. As was the case with the adjoint in (3.3), the actual 
form of these equations is chosen to ensure the validity of the conservation property (3.5). 
More precisely we have the following result; 

Proposition 4.1. For each choice of vectors x, u, 5, (^, A and real t: 

(4.5) (^-dxf{x,u,t)'^ Xj 6 + X'^(^dj;f{x,u,t)S + duf{x,u)C^=0. 

Therefore if 5{f), X{t), C(f) satisfy (4.2)—(4.4), then (3.4)—(3.5) hold. 

The use of the proposition is as follows. We solve the two-point boundary problem given 
by the statesH-costates system (4.1), (4.3)-(4.4) with initial/final conditions 

(4.6) x{to) = cn, X{tQ + T) = VC{x{tQ + T)). 

Then, the variation i5(fo + T) at the end of the interval is orthogonal to the gradient of the 
cost since, from (3.5), 

(4.7) VC(x(fo + + T) = A(fo + T)^ + T) = X{toY 5{h) = 0. 

This of course means that any solution [a;(f)^, A(t)^, of the boundary-value prob¬ 

lem satisfies the first-order necessary condition for C to attain a minimum. As in sensitivity 
analyses, the costates A may be interpreted as Lagrange multipliers. 

It is customary to introduce the function H(x, A, u, t) = X~^ f{x, u, t) (pseudo-Hamilton¬ 
ian) so that (4.1), (4.3)-(4.4) take the very symmetric form 

(4.8) ^x = VxH, ^X = VuH = 0. 

dt dt 

4.2. The discrete problem: indirect approach. In the indirect approach, approxima¬ 
tions to the optimal states, costates and controls are obtained by discretisation of the boundary 
value problem (4.1), (4.3)-(4.4), (4.6). Note that we have to tackle a differential-algebraic 
system [22, Chapter VI. 1], with the controls being algebraic variables as {d/df)u does not 
feature in any of the equations (4.1), (4.3)-(4.4). Under suitable technical assumptions (in- 
vertibility of the second derivative of H with respect to u), the system is of index one. 
This means that the constraints (4.4) may be used to express, locally around the solution 
of interest, the algebraic variables as functions of the differential variables, u = 4’(a:, A, t). 
(When applying the implicit function theorem, the relevant Jacobian matrix is the Hessian 
duuH and this will generically be positive definite, if Pontryagin’s principle [41, Section 
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7.2] holds so that H{x, X, •, t) is minimised by <l>(x, A, t).) For a system of index one we 
may think that the right-hand sides of (4.1) and (4.3) have been written as functions of x, 
X and t by setting u = ^{x,X,t), thus transforming the differential-algebraic system into 
a system of ordinary differential equations. In fact the transformed system is the canonical 
Hamiltonian system with Hamiltonian function TL{x, X, t) = i7(x, A, A, because 
the chain rule and VuiJ = 0 imply that, in (4.8), VxH{x,X,u,t) = VxT^ix, X,t) and 
VxH{x, X, u, t) = VxTf (cc, A, t). This Hamiltonian system may be discretised with the PRK 
scheme (2.1), (2.9). (Recall that RK schemes are included as particular cases where both sets 
of coefficients just coincide.) The discrete equations are solved to find the approximations 
Xn and An to x{tn), X{tn) and finally the approximations to the controls are retrieved as 

The analytic expression of the implicit function $ will in general not be available, so 
that it will not be possible to find Ti. explicitly. This is not a hindrance: the approximations 
Xn, Xn, Un that One would get by a PRK integration of the Hamiltonian system may be 
found in practice as solutions of the set of equations (4.9)-(4.16) below, obtained by direct 
discretisation of the differential-algebraic format (4.1), (4.3)-(4.4). The equivalence between 
the two approaches, differential and differential-algebraic is seen by eliminating the controls 
from (4.9)-(4.16), see [22, Chapter VI.l]. 

The discrete equations are (n = 0,..., IV — 1): 


S 


(4.9) 

^n+1 — 

- Xn hn ^ ^ bikn,i-} 
i—1 

(4.10) 

kn,i — 

f {Xn^i 5 Un,ii CiHn) , i — 1, . . . , S, 

(4.11) 

^n,i — 

Xn hn ^ ^ 0,ij kn,j ; ^ 1; ■ ■ ■ ; 

(4.12) 

•^n+l — 

s 

- Xn H” hn ^ ^ 

i—1 

(4.13) 

f ■ — 

^n,i — 

^xf (-^n,Z5 Un,i-) Cihn') -Xn^i^ i — 1, . . . , S 

(4.14) 

^n,i — 

s 

Xn H” hn ^ ^ Xij£n,j j ^ ■ ■ ■ ; 

j—l 

(4.15) 

duf{X 

Un,i-) “1“ Cihn') Xxn^i — 0, % — 1, . . . , S, 

together with {r, 

^ = 0,.. 

.,iV) 

(4.16) 


h^uf i^Xni y-nXn^ Xn — 0; 

and the boundary conditions xq = a, Xn = VC{xn) from (4.6). 


What is the accuracy of this technique? We encounter the same difficulty we found 
in the preceding section: relevant here is the order of the overall PRK scheme rather than 
the (possibly higher) order of the RK coefficients (2.1) used for the state variables. In the 
preceding section the approximations Xn are found independently of the A„ and, accordingly, 
the possible order reduction does not affect them. In the optimal control problem, states and 
costates are coupled and any order reduction will harm both of them. This was first noted 
by Hager who also provided relevant counterexamples, see [17, Table 3]. Hager (Proposition 
6.1) also shows that there is no order reduction for explicit, fourth order RK schemes with 
positive weights. 
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The obvious analogue of Theorem 3.2 holds: the variations in the discrete solution 
Xn satisfy the equations that result from discretising (4.2) with the coefficients (2.1). These 
equations are (3.10) and (3.12) where now 



(A„_i, Zn^i are the stages associated with the variables 5 and Q. 

Assume next that the PRK is symplectic. Recall that symplecticness may be the result of 
choosing the RK coefficients (2.1) (6^ = 1,..., s) for the state variables and retrieving 

from (3.23) the coefficients (2.9) for the integration of the adjoint system. The symplecticness 
of the integrator makes it possible to formulate a discrete analogue of Proposition 4.1. 

Theorem 4.2. Assume that Xn, A„, n = 0,..., N, satisfy the equations (4.9)- 
(4.16) arising from the application of a symplectic PRK method and that, furthermore, (5„, 
n = 0,..., A, (5o = 0, are the variations in Xn- Then, for n = 0,..., iV — 1, 



The PRK scheme may be a symplectic RK scheme or the result of choosing freely the RK 
coefficients (2.1), bi f 0, i = 1,..., s, for the states and then using (3.23) to determine the 
coefficients for the integration of the costates. 

Proof. Use Lemma 2.5 with S{q,p) = This results in 



where kn,i and ^ come from (4.17) and (4.13) respectively. According to (4.5), each of the 
terms being summed vanishes. □ 

When the boundary conditions (4.6) are imposed. 


'^C(xn)^5m = A^^at = AJAo = 0, 


which means that the discrete solution satisfies the first-order necessary conditions for C(a;Ar) 
to achieve a minimum subject to the constraints (4.9)-(4.11) and Xg = a. In this way we have 
proved that symplectic discretisation commutes [29] with the process of forming necessary 
conditions for minimisation: 

Theorem 4.3. Let {Xn}, {A„}, {it„} be a solution of the equations (4.9)-(4.16) arising 
from discretising with a symplectic PRK integrator the necessary conditions for the continu¬ 
ous optimal control problem. Then {x„}, {A„}, {u„} satisfies the necessary conditions for 
C{xn) to achieve a minimum subject to the discrete constraints (4.9)-(4.11) and Xq = a. 
The PRK scheme may be a symplectic RK scheme or the result of choosing freely the RK 
coefficients (2.1), hi f 0, i = 1,..., s, for the states and then using (3.23) to determine the 
coefficients for the integration of the costates. 

When the statesH-costates system is integrated by means of a non-symplectic PRK, x n 
will not satisfy the necessary conditions for C to be minimised subject to the constraints (4.9)- 
(4.11) and xg = a. Therefore non-symplectric PRK discretisations cannot he obtained via 
the direct approach considered next. 

4.3. The discrete problem: direct approach. The direct approach (see e.g. [41, Chap¬ 
ter 9]) based on RK discretisation begins by applying the scheme (2.1) to the differential 
equation (4.1) to get (4.9)-(4.11). Then, these equations and xq = a are seen as constraints 
of a finite-dimensional optimisation problem for the minimisation of C(x 7 v). 
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We use the standard method of Lagrange multipliers based on the Lagrangian in (3.26), 
trivially adapted to the present circumstances by letting / depend on the controls. The method 
leads in a straightforward way to the following result, first proved by Hager [17], see also [4]. 
However [17] does not point out that the relations (3.23) correspond to symplecticness. Fur¬ 
thermore [17] and [4] do not use a discrete Lagrangian obtained by discretisation of the con¬ 
tinuous Lagrangian. These papers and [7] do not point out that the occurrence of symplectic 
schemes in this context is really due to the conservation property (3.5). 

Theorem 4.4. The first-order necessary conditions for the minimisation ofC{xi\[) sub¬ 
ject to xq = a and (4.9)-(4.11), bi 0, i = 1,..., s, are Xg = a, VC(xn) = Xn together 
with (4.9)-(4.15), with the coefficients Aij, Bi, Ci given by (3.23). 

In other words, when the direct approach is used, we arrive at exactly the same set of 
equations for a;„, A„, i, A„ i, i we obtained, with the help of RK technology, via the 
indirect approach in Theorem 4.3. Let us observe that the direct approach does not provide 
‘natural’ approximations to u(tn). Hager [17] suggests to define u„ by locally minimis¬ 
ing H{xn, Xn,u, tn) which leads to (4.16). He also notes ([17], Table 4) that the order of 
convergence of the control stages Un,i may be lower than that in something that it is not 
surprising at all: typically, internal stages are less accurate than end-of-step approximations. 
We remark that, in the direct approach and once the RK method for x has been chosen, the 
minimisation of C implicitly provides the ‘right’ coefficients Aij, Bi, Ci to be used in the 
integration of the costates in order to ensure symplecticness of the overall PRK integrator. In 
the indirect approach those coefficients have to be determined by using the relations (2.16)- 
(2.17) and Theorem 2.4. 

While the direct and indirect approaches may be seen as mathematically equivalent here, 
both have their own interest. The direct approach suggests to solve the discrete PRK equations 
with the help of optimisation techniques and these may be an efficient choice in practice. On 
the other hand, the direct approach ‘hides’ the PRK integration of the costates, a fact that 
may lead to the false impression that the order of accuracy of the overall procedure coincides 
with the order of the RK scheme used to discretise the differential constraint (4.1). This was 
emphasised in [17], where the order of the PRK method (2.1), (2.9), (3.23) is called the order 
of the RK method (2.1)/or optimal control problems. A discussion of the advantages of the 
direct and indirect approaches is not within our scope here, see e.g. [41, Chapter 9], [10]. 

5. Some extensions. We now consider more general optimal control problems. We shall 
need to generalize Theorems 2.1 and 2.4 to the situation where the quantities I or S are not 
constant along trajectories of the system but vary in a known manner. 

5.1. Generalised conservation. Here are simple generalisations of Theorems 2.1 and 
2.4. Only Theorem 5.2 will be proved; the other proof is very similar. 

In order to better understand Theorem 5.1, we may look at the case where y comprises 
positions and velocities of a mechanical system and I is the kinetic energy. Conservation of 
energy demands that the rate of change of I coincides with the rate of change (power) tp of 
the work of the forces. Along each trajectory, the gain in kinetic energy exactly matches the 
total work exerted by the forces. 

Theorem 5.1. Assume that, for the differential system (2.2), there exist a real-valued 
bilinear mapping I in x and a real-valued function p in R^ such that, for each 
solution y(f) 
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and, therefore. 


I{y{to + T),y{to + T)) - I{y{to),y{to)) 


Jto 






If the system is integrated by means of a symplectic RK scheme as in (2.4)-(2.6), then 


N-l s 

I{yN,yN) - I{yo,yo) = ^ /in ^ 

n—0 i—1 

Note that the last sum, based on the RK quadrature weights bi and in the approximation 
y{tn + Cihn) ~ Yn^i, is the ‘natural’ RK discretisation of the corresponding integral. 

Theorem 5.2. Assume that, for the partitioned system (2.10), there exist a real-valued 
bilinear map S in x and a real-valued function tp in x W^, such that for each 

solution 


j^S{q{t),p{t)) = p{q{t),p{t)) 

and, therefore, 


ntQ+T 

S{q{to-\-T),p(^to-\-T)) - S{q{to),p{to)) = / p{q{t),p{f)) dt. 

Jto 

If the system is integrated by means of a symplectic PRK scheme as in (2.11)-(2.I3), then 

N-l s 

S{qN,PN) - S{qo,po) = X] X] ‘PiQn.i, Pn,i)- 

n—Q i—1 


Proof. Use Lemma 2.5 and note that, under the present hypotheses. 


S{kn,i,Pn,i) + S{Qr. 


Pn,i) ^ 


because S{f{q,p,t),p) + S{q, g{q,p,t)) = p{q,p) (cf. the proof of Theorem 2.4). □ 

5.2. Other optimal control problems. Consider first the situation in Section 4, but 
assume that the value a;(fo) is not prescribed. Then Sfo) is free and for (4.7) to hold it is 
necessary to impose the condition A(fo) = 0. This replaces in (4.6) the initial condition 
x{to) = a. The results in Section 4 are valid in this setting after the obvious modifications. 

We next look at the case where (4.1) and a;(0) = a are imposed, but the cost function is 
given by 

/*io+T 

(5.1) C(x(fo+7’))+ / T>{x{t),u{t),t) dt 

J tQ 

(this is often called a Mayer-Lagrange cost [41], as distinct from the Mayer cost C{x{to -\-T)) 
envisaged before). The adjoint system and constraints are, respectively. 
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These are of the form in (4.8) for the pseudo-Hamiltonian H = f + V. 

The conservation property (3.5) is replaced by the generalised conservation formula 

A(fo + T)^S{tQ +T) — A(fo)^iJ(fo) 

rto+T 

+ / 5{t)+ X/u'D{x{t),u{t),tYdt = {), 

to 

which holds for arbitrary d{t), A(t) satisfying the variational equations (4.2), the adjoint 
system and the constraints. After setting S{to) = 0 and A(fo + T) = VC{x{to + T)), the 
generalised conservation formula expresses that the the variation of the cost vanishes, i.e. that 
the first-order necessary conditions for the minimisation hold. 

For a symplectic PRK discretisation of the algebraic-differential system. Lemma 2.5 may 
be used, just as in the proof of Theorem 5.2, to show (the notation should be clear by now); 

N-l s 

xjf Sn ~ aJ(5o + h Un,i, tn + 

n—0 i=l 

+ Un,i, tn + Cihn)^Zn,i^ = 0 . 

By setting Xn = '^C{xn) and Jq = 0, this formula expresses the necessary condition (or¬ 
thogonality between gradient and variantion) for the discrete solution to minimise the discre- 
tised cost 


Af-l s 

C{x]\f) + hV{Xn,^,Un,^). 

n—0 i=l 

Therefore also in this case, results corresponding to Theorems 4.3 and 4.4 hold for a sym¬ 
plectic PRK discretisation. 

It is of course possible to combine the cost (5.1) with alternative boundary specifications. 
If x(to) is not prescribed, then we have to impose A(fo) = 0, as pointed out above. If both 
x(to) = a and x(to + T) = /3 are imposed (in which case the term C(x(to -f T)) may be 
dropped from the cost), then A(fo) and A(fo + T) are both free. 

5.3. Constrained controls. Let us go back once more to the problem in Section 4 and 
suppose that the controls u are constrained so that, for each t, it is demanded that u{t) G U, 
where [/ is a given closed, convex subset of Then (see e.g. [17]), the constraint (4.4) on 
A has to be replaced by 

u{t) G [/, -duf{x{t),u(t),t)^X(t) G Nuiu{t)), 

where Nu{u) is the cone of all vectors w G ’M." such that, for each v G U, (v — u) < Q. 
Proceeding as in Proposition 4.1, we see that now {d/dt)X{tY6{t) > 0 and therefore 

VC(x(fo-f T))T5(io-p T) > 0, 

which is the necessary condition for a minimum in the continuous problem. For a PRK 
discretisation of the boundary value for the statesH-costates system, the relation 

[d/dt)X{ty5{t) > 0 


Z-T A .1 A T n . > n 


implies 
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and therefore we may use Lemma 2.5 yet again to conclude that for symplectic PRK methods 
and if the weights bi are positive. 


yc{xNVSN > 0 . 


Once more, results similar to Theorems 4.3 and 4.4 hold. See [9] for order reduction results. 

6. Lagrangian mechanics. Let us now consider Lagrangian mechanical systems [2]. 
Denote by C{x, u, t) the Lagrangian function, where x S are the Lagrangian co-ordinates 
and u = [d/dt)x the corresponding velocities. According to Hamilton’s principle, the trajec¬ 
tories t I— x{t) of the system are characterised by the fact that they render stationary (often 
minimum) the action integral 



among all curves f i—>■ x{t) witha;(fo) = x{to) andx(fo+7’) = x{ttj+T). This may of course 
be viewed as a control problem to make stationary (or even maximum) the cost (5.1) with 
C = 0 and 2? = —C, subject to the constraint i; = u with fixed end-values a:(fo) anda;(fo+2^)- 
The theory in Section 5 applies. The pseudo-Hamiltonian is iT(a:, A, m, f) = X^u — C{x,u,t). 
The constraint VuH = 0 reads A = VuJ0{x, u, t)\ thus the control costates coincide with the 
mechanical momenta. The elimination of the controls with the help of Pontryagin’s principle 
would determine m as a function ^{x,X,t) by maximising (recall that we are here trying to 
maximise the cost!) the function u H{x, X, u, t). In mechanics, this exactly corresponds 
with the theory of the Legendre transformation as presented in [2, Section 14]: that theory 
shows that, if £ is a strictly convex function of u, then, at given x and t, the velocity vector 
u that corresponds to a given value of the momentum A is globally uniquely defined and 
maximises A^u — C{x, u, t). In most mechanical problems C = T{x, u, t) — V{x, t), with 
T and V the kinetic and potential energy respectively, and T is quadratic, positive-definite 
as a function of u, thus ensuring the required convexity. In control theory the elimination 
of the controls u in the pseudo-Hamiltonian H gives rise to the ‘control’ Hamiltonian 72; 
correspondingly, in mechanics the Hamiltonian is defined as the result of expressing in A^u — 
C{x, u, t) the velocities as functions of the momenta (and x and t). Finally the evolution of 
the states and costates (mechanical co-ordinates and momenta) obeys Hamilton’s canonical 
equations. Hamiltonian solution flows are symplectic and, in this way, we have travelled all 
the way from action minimisation to symplecticness. 

A similar journey may take place in the discrete realm. Choose any RK scheme (2.1) 
with nonzero weights to discretise the differential constraint {d/dt)x = u and minimise the 
associated discrete action 


N-l 


s 



n—0 i—1 


As we know from Theorem 4.3, this direct approach implies a symplectic PRK integration of 
the Hamiltonian system for x and A, where the A equations are integrated with the coefficients 
(2.9). This is nothing more than the variational construction of PRK symplectic integrators, 
already presented in the early paper [40] by Suris (see [26] for more information on integra¬ 
tors based on the principle of least action, cf. [23]). In this way, Hager’s result [17] may be 
viewed as an extension of Suris’s work to general control problems. 
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7. What is the adjoint of a Runge-Kutta method? Reflecting and transposing co¬ 
efficients. In this section we examine the relations between the preceding material and the 
notion of the adjoint of an RK method. 

Scherer and Tiirke [35] associated with the set of RK coefficients (2.1) two new sets 
called the reflection and the transposition of the original. The reflected coefficients are given 
by (z,j = 

dj^j — hj O-ij ^ hj^ — 6^, Cj^ — 1 C-i 

and the transposed coefficients are defined, only for methods with nonzero weights hi, by 

= bjaji/bi, b\ = bi, c■ = 1 - Ci. 

The operations of reflection and transposition commute; the transposition of the reflection 
coincides with the reflection of the transposition as both lead to 

Qij = bj — bjttji/bi, bl* = bi, Ci* = Ci. 

Furthermore both operations are involutions: each is its own inverse. 

The paper [35] introduces the operations of reflection and transposition as algebraic ma¬ 
nipulations that make it possible to interrelate important families of RK methods; no attempt 
is made there to interpret computationally the meaning of integrating with the reflected or 
transposed coefficients. What do reflection and transposition mean? The interpretation of re¬ 
flection is well known [34, Section 3.6], [21, Chapter II, Theorem 8.3]; a step of length —hn 
with the reflected RK method inverts the transformation i-A Vn+i induced by a step of 
length hn with the original method. In this paper we have seen this idea at work when mov¬ 
ing from (3.14)-(3.16) to (3.17)-(3.19). The formulas (3.23) provide meaning to the idea of 
transposition: to construct a symplectic PRK out of a given RK method with nonvanishing 
weights the p coefficients are determined by reflecting and transposing the given q coeffi¬ 
cients. The transposed of the q coefficients are then those required to integrate backwards the 
p equations in, say, sensitivity analyses. 

As a further illustration of these ideas, consider the linear non-autonomous system 

= M {t)q, ^p = -M (f) V, 

integrated with the PRK method (2.1), (2.9) (this is a Hamiltonian system). Since p and q are 
uncoupled, this amounts to an RK integration of the q equations with the coefficients (2.1) 
together with an RK integration of the p equations with the coefficients (2.9). The system 
has the invariant Theorem 2.4 ensures that it will be preserved if the p coefficients are 
the transposition of the reflection of the g coefficients. Both sets of coefficients only coincide 
if g itself is integrated symplectically. If we wish to preserve the invariant, a nonsymplectic 
integration of g is possible, but then one has to compensate by integrating the p equations 
in an appropriate way and the order and stability of the p integration have to be investigated 
separately. Again, if the p equations are integrated backward in time, then, preservation 
of q^p requires that such backward integration be performed with the transposition of the 
coefficients used to propagate g forward. 

We conclude this section with a remark on terminology. Monographs such as [19] and 
[34] use the word adjoint to refer to the method with reflected coefficients. Section 3 and our 
last comments suggest that, in order to proceed as in the differential equation case, it would 
have been better to keep the word adjoint for the reflected and transposed method. And call 
reflected to what in [19] or [34] is called adjoint. With that alternative terminology, for RK 
schemes, symplecticness would simply be self-adjointness. 
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8. Conclusion. Symplectic RK and PRK schemes preserve, by definition, the symplec- 
tic form in phase space; in addition, they may be characterized as those RK or PRK integra¬ 
tors that exactly preserve each quadratic invariant of the system being integrated. In sensi¬ 
tivity analysis, optimal control and other areas, adjoint systems are introduced and possess 
paramount importance; these adjoints are defined so as to preserve the key quadratic invariant 
(3.5). Therefore, there are tight connections between those areas and the theory of symplectic 
integration; we hope the present paper has helped to understand those connections. 
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Appendix: Schemes with some vanishing weights. If one or more weights bi in (2.1) 
vanish, then it is not possible to use the recipe (3.23) to define the coefficients required to cre¬ 
ate a combined symplectic PRK method (2.1), (2.9). Given the partitioned system (2.10) and 
the q coefficients ( 2 . 1 ), how to integrate the p equations so as to have a symplectic scheme? 
The solution to this problem is rather weird and it is best to begin with the simplest example. 

Let us study the second-order scheme (due to Runge in his 1895 original paper [21, 
Section II. 1]), s = 2, 

(8.1) 011 = 021 = 022 = 0, 012 = 1/2, bi = 1, &2 = 0, Cl = 1/2, C 2 = 0. 

While it is customary to label the stages so that the abscissas d increase with i, we have de¬ 
parted from this practice; if we adopted it, formula ( 8 . 6 ) below would get a rather disordered 
appearance. 

We regularise the zero weight and consider the one-parameter family, e 7 ^ 0; 

(8.2) 011 = 021 = 022 = 0, 012 = 1/2, hi = 1, &2 = e, ci = 1/2, C2 = 0. 

(The regularised scheme is not even consistent, but this does not hinder the argument.) From 
(3.23), we set 

(8.3) All = 1, Ai2 = A 22 = e, A 21 = 1 - l/(2e), Bi = 1, B 2 = e, Ci = 1/2, C 2 = 0. 
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Thus, the PRK specified by (8.2)-(8.3) is symplectic for each e. The idea now is to take limits 
as e —0; the limit integrator, if it exists, will preserve quadratic invariants and, when applied 
to Hamiltonian problems, the symplectic structure. The difficulty is that from the equation 
that defines 2 

Pn,2 — Pn Pn,l: in ^n/2) “t“ p(Qn,2 1 Pn,2 5 in') 

we may expect that, for fixed q^, Pn, the stage vector 2 grows unboundedly as e 0 

and that, therefore, a limit integrator cannot be defined. However, the stage 2 only affects 
P„ 1 and Pn+i through the small coefficients A 12 = B 2 = e, and this makes it possible to 
prove that the limit scheme exists for some particular differential equations. Specifically, we 
assume in the remainder of this section that in the partitioned differential system (2.10) being 
integrated, / and g have the special form 

(8.4) f = f[q^f) g = L{q,t) + M{q,t)p 

(with q = X, p = X, this format includes the system (3.1), (3.3) in Section 3). When (8.4) 
holds, the q integration with coefficients (8.2) converges, as e —>• 0, to the integration with 
the originally given coefficients (8.1). The system for the p stages Pi, P 2 (the index n is 
sometimes dropped to shorten the formulas) may be written as 

Pi = Pn p hn{Ll + PIiPi) + hn{eL 2 + hnM 2 Tn 2 ), 

^2 = + (^ “ ^) (-^1 + ^iPl) + + hnM2m2), 

where we have scaled m 2 = (e//i„)P 2 to avoid blow-up and used the abbreviations 

Li = L{Qi, tn + hn/2), Ml = M(Qi, tn + hnl2), 

L2=L{Q2,in), AI 2 = M[Q2,in)- 

Now take limits as e —>■ 0, to get 

Pi = Pn P hn{Li MiPi) -\- ll^M2m2, 

m 2 = -^{Pi P MiPi)- 

Since Bi = An and B 2 = A 12 , the end-of-step approximations is given by p„+i = Pi. 

We write these equations in a way similar to (2.11)-(2.13); 

(8.5) p„+l =PnP hn£l P hlM2m2, 

^1 = g{Qlj Plfn p hnl2), 

M 2 = M{Q2,in), 

Pi =PnP Kil P hlM2m2, 

m 2 = -2^1- 

The combination of these formulas for p with the scheme (8.1) for q is first-order 
integrator that conserves quadratic invariants as in Theorem 2.4 and, for Hamiltonian prob¬ 
lems, preserves the symplectic structure. Of course the integrator is not a PRK method; 
since M = dpg, the formula (8.5) is reminiscent of Runge-Kutta methods that use higher 
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derivatives of the solution [21, Section II. 13]. (Such high-order derivative methods cannot be 
symplectic for general problems [20].) Note that, while is an approximation to the first 
derivative {d/dt)p, the vector M 2 TO 2 has the dimensions of the second derivative {d?/dt^)p. 

Let us now turn to the general case. Assume that in (2.1) the first r weights 61 , ..., 
do not vanish, while hr+i = ■ ■ ■ = bg = 0- The regularisation procedure used for Runge’s 
method leads to the fancy integrator: 

r s 

Pn-\-l — Pn hji ^ ^ biii ^ ^ 

i—1 Q:=r +1 

Pi=Pn + K'^ {bj - 

J=1 

P=r+1 * 

r s 

ITla = —''^bjUjaij - hn ^ apaMpTUp, a = r + 1,..., s. 

3 = 1 p=r+l 

Here the r vectors li are as in (2.12), so that the method uses r slopes and additionally s — r 
matrices Ma = M{Qa,tn + Cahn). From the relations ( 8 . 8 ) the nia may be viewed as 
functions of the ii. 

The following result is a consequence of the construction via regularisation: 

Theorem 8.1. Consider partitioned systems of the special format (8.4), where the q 
equations are integrated with the RK scheme (2.1), bi 0,..., br 0, fer+i = ■ ■ • = bg = 0, 
and the p equations with the formulas in (8.6)-(8.8). If S(q(t),p{t)) is a conserved quantity 
as in Theorem 2.4, then S(qn,Pn) A independent ofn. If the system is Hamiltonian, then the 
map (qn,Pn) '-t (qn+i.Pn+i) is symplectic. 

With the terminology of Section 7, for systems of the special form (8.4), the scheme ( 8 . 6 ) 
may be viewed as the reflected and transposed of ( 2 . 1 ) when this possesses one or more zero 
weights. 

Proofs of Theorem 8.1 that do not rely on taking limits as e —0 are of course possible. 
For such an alternative proof of the conservation of S, we may note that manipulations (not 
reproduced here) similar to those used to prove Lemma 2.5 show that for the present method, 
in lieu of (2.18), we may write: 


( 8 . 6 ) 

(8.7) 

( 8 . 8 ) 


S(qn+l 7 Pn+1) S(qn , Pn) — ^n ^ ^ (k^, Pi ) -f S(Qij 

i=l 

s 

+ hn E , nia) + S(Qa, Mama)). 

This is an algebraic identity that does not require that the system integrated to be conservative. 
When S is conserved, the first sum vanishes as in the proof of Theorem 2.4. For the second 
sum note that from S(f(q,t),p) + S(q,L(q,t) + M(q,t)p) = 0 it follows that S(f,p) + 
S{q,Mp)=0. 

For the adjoint equations in Section 3, the conclusion of Theorem 3.4 holds if the x equa¬ 
tions are integrated with a (nonsymplectic) RK method with one or more vanishing weights 
and the A equations are integrated as in (8.6)-(8.8). Similarly Theorem 3.6 holds for a suitable 
choice of the Lagrangian (details will not be given, but see below). 
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What is the situation for the control problem in Section 4? Recall that the corresponding 
system of differential equations is given by (4.1), (4.3), where, in the right-hand sides, u has 
been expressed as m = 4)(a;, A, t). That system of differential equations does not possess the 
format (8.4) for which ( 8 . 6 ) makes sense and, accordingly, we cannot provide analogues to 
Theorems 4.2 and 4.3. 

In order to gain additional insight, let us use the direct approach based on Runge’s second 
order integrator (8.1). We dehne the Lagrangian (compare with (3.26) and note consistency 
with (3.6) due to the factor h^): 


N-l 


C(^Xn') Aq (xq Oi) ^ ^ h^Ayj_j_2^ (Xyj+i Xn') kn,l 

r, ^ 

n—0 

N-l 

^ ^ hn-^n ^n,l f {^n,l i i / 2 ) 

n—0 
N-l 

^ ^ ^nMn ^n,2 f (-^n,2; Un,2 7 ^n) 


n—Q 


where, as on other occasions, the stages Xn,i = Xn (/in/ 2 )A:^, 2 , Xn ,2 = must be seen 
as known functions of Xn and 2 - Taking gradients with respect to Xn, kn ,2 leads to 

the necessary conditions 


•^n+l — '^n 7 ^n,l7 An Ayj 

- hl{d^f{Xn,2, Un,2, 


- Ayj + l, 

d'n — 2 “t” ^n/2)) 


which clearly correspond to the integrator (8.5). (By considering the case where / is inde¬ 
pendent of u, this shows that Theorem 3.6 holds in this case.) However, taking gradients with 
respect to Un,i and [/„ 2 yields 

{duf{Xn,l,Un,l,tn + h„/2))'^A„ = 0, {duf{Xn,2,Un,2,tn)Vltn = 0. 


The second equation is totally meaningless. It cannot be seen as a discretisation of (4.4) 
because is not an approximation to the costate A; it does not even possess the right di¬ 
mensions for that to happen. The values of Un ,2 retrieved from this constraint will have no 
relation to the true optimal controls. The paper [17] nicely illustrates this with an example 
(see also [9]). 

Since the trouble arises by the presence of the controls, things may be hxed by tamper¬ 
ing with ?7„ 2 j as pointed out in [17], [9]. However, there is no shortage of RK schemes 
with nonzero (or even positive) weights, so that, in practice, resorting to such hxes seems ill 
advised. 








